DataFrames¶

A DataFrame is a two-dimensional data structure: data is arranged in a table of rows and columns. Features of a DataFrame:

  • Columns can be of different types
  • Size is mutable
  • Labeled axes (rows and columns)
  • Arithmetic operations can be performed on rows and columns

A pandas DataFrame can be created using the following constructor:

  • pandas.DataFrame(data, index, columns, dtype, copy)
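As a rough illustration of what the constructor arguments do, the sketch below passes each of them by keyword. The values and the labels 'r1', 'r2', 'x' and 'y' are made up for this example; copy is left at its default.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the parameters.
df = pd.DataFrame(
    data=[[1, 2], [3, 4]],   # the values, one inner list per row
    index=['r1', 'r2'],      # row labels
    columns=['x', 'y'],      # column labels
    dtype=float,             # ask for every column to be stored as float
)
print(df)
print(df.dtypes)
```

When index or columns is omitted, pandas falls back to default integer labels, as the following sections show.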

Create DataFrame: A pandas DataFrame can be created from various inputs, such as:

  1. Lists, 2D lists and lists of tuples
  2. dict
  3. Series
  4. NumPy ndarrays

Empty DataFrame¶

In [1]:
import pandas as pd
df=pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []

1. Create a DataFrame from List¶

A DataFrame can be created from a single list or a list of lists.

Series vs DataFrame¶

Using Series¶

In [2]:
import pandas as pd
data= [1,2,5,4,6]
Ser=pd.Series(data)
Ser
Out[2]:
0    1
1    2
2    5
3    4
4    6
dtype: int64

Using DataFrame¶

In [3]:
import pandas as pd
data= [1,2,5,4,6]
df=pd.DataFrame(data)
df
Out[3]:
   0
0  1
1  2
2  5
3  4
4  6

Note: As you can see, a Series has no column name, whereas a DataFrame gets a default column label starting from zero (0).
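The difference can also be checked programmatically. The following is a small sketch using the same list as above; it only inspects the dimensions and labels of the two objects.

```python
import pandas as pd

data = [1, 2, 5, 4, 6]
ser = pd.Series(data)
df = pd.DataFrame(data)

print(ser.ndim, ser.shape)   # 1 (5,)
print(df.ndim, df.shape)     # 2 (5, 1)
print(list(df.columns))      # [0]
print(ser.name)              # None
```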

Create a DataFrame From 2D List¶

In [4]:
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
In [5]:
df=pd.DataFrame(data)
df
Out[5]:
        0   1      2
0   Robin  26  45.34
1   Karan  25  78.50
2   Priya  23  87.67
3   Varun  22  56.00
4  Keisha  23  97.00

Adding column names¶

In [6]:
df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df
Out[6]:
     Name  Age  Marks
0   Robin   26  45.34
1   Karan   25  78.50
2   Priya   23  87.67
3   Varun   22  56.00
4  Keisha   23  97.00

Create a DataFrame from a List of Tuples¶

In [7]:
data = [('Robin',26,45.34),('Karan',25,78.5),('Priya',23,87.67),('Varun',22,56),('Keisha',23,97)]
df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df
Out[7]:
     Name  Age  Marks
0   Robin   26  45.34
1   Karan   25  78.50
2   Priya   23  87.67
3   Varun   22  56.00
4  Keisha   23  97.00

2. Create a DataFrame from Dict¶

All the lists (or ndarrays) in the dict must be of the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, the index defaults to range(n), where n is the array length.
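The length rule can be seen in a short sketch. The shortened data below is made up for this illustration; the second call is expected to fail with a ValueError because the two lists differ in length.

```python
import pandas as pd

# Lists of equal length work, and the default index is 0..n-1.
ok = pd.DataFrame({'Name': ['Ayush', 'Priya'], 'Age': [28, 21]})
print(list(ok.index))   # [0, 1]

# Lists of unequal length are rejected.
try:
    pd.DataFrame({'Name': ['Ayush', 'Priya'], 'Age': [28]})
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```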

In [8]:
import pandas as pd
data = {'Name':['Ayush', 'Priya', 'Kapil', 'Rohit'],'Age':[28,21,29,42]}
df = pd.DataFrame(data)
df
Out[8]:
    Name  Age
0  Ayush   28
1  Priya   21
2  Kapil   29
3  Rohit   42

Adding an index¶

In [9]:
df = pd.DataFrame(data, index=['i1','i2','i3','i4'])
print(df)
     Name  Age
i1  Ayush   28
i2  Priya   21
i3  Kapil   29
i4  Rohit   42

Create a DataFrame from List of Dicts¶

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are taken as column names.

In [10]:
import pandas as pd
data = [{'a': 12, 'b': 32},{'a': 15, 'b': 50, 'c': 23},{'a': 65, 'b': 45, 'c': 19}]
df = pd.DataFrame(data)
df
Out[10]:
    a   b     c
0  12  32   NaN
1  15  50  23.0
2  65  45  19.0
Note: NaN (Not a Number) is filled in wherever a dictionary has no value for a column.¶
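The missing value also explains why column c is displayed as 23.0 and 19.0: NaN is a float, so the whole column is stored as float64. The sketch below checks this and shows one possible way to handle the gap; filling with 0 is an arbitrary choice made for illustration.

```python
import pandas as pd

data = [{'a': 12, 'b': 32}, {'a': 15, 'b': 50, 'c': 23}, {'a': 65, 'b': 45, 'c': 19}]
df = pd.DataFrame(data)

print(df.dtypes)               # 'c' is float64 because NaN is a float
print(df['c'].isna())          # True only for the first row
filled = df.fillna({'c': 0})   # replace the missing value with 0
print(filled)
```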
In [11]:
df = pd.DataFrame(data, index=['First', 'Second','Third'])
df
Out[11]:
         a   b     c
First   12  32   NaN
Second  15  50  23.0
Third   65  45  19.0

With two column labels that match the dictionary keys¶

In [12]:
df1 = pd.DataFrame(data, index=['First', 'Second','Third'], columns=['a', 'b'])
df1
Out[12]:
         a   b
First   12  32
Second  15  50
Third   65  45
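For contrast, here is a sketch of what happens when a requested column label does not match any dictionary key. The label 'd' is hypothetical and appears in none of the dictionaries, so that column comes back empty.

```python
import pandas as pd

data = [{'a': 12, 'b': 32}, {'a': 15, 'b': 50, 'c': 23}, {'a': 65, 'b': 45, 'c': 19}]

# 'a' matches a key; 'd' matches nothing, so its column is all NaN.
df2 = pd.DataFrame(data, index=['First', 'Second', 'Third'], columns=['a', 'd'])
print(df2)
```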

3. Create a DataFrame from Dict of Series¶

A DataFrame can be created by passing a dictionary of Series. The resulting index is the union of all the Series indexes passed.

In [13]:
import pandas as pd
data={'Col1': pd.Series([1,5,2,5,6],index=['a','b','c','d','e']), 'Col2': pd.Series([25,87,52,65,89],index=['a','b','c','d','e']) }
df=pd.DataFrame(data)
df
Out[13]:
   Col1  Col2
a     1    25
b     5    87
c     2    52
d     5    65
e     6    89
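In the example above both Series share the same index, so the union is not visible. A minimal sketch with deliberately different indexes (shortened, made-up data) shows the effect: the label 'd' exists only in Col2, so Col1 gets NaN in that row.

```python
import pandas as pd

data = {
    'Col1': pd.Series([1, 5, 2], index=['a', 'b', 'c']),
    'Col2': pd.Series([25, 87, 52, 65], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(data)
print(df)
```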

4. Create a DataFrame from NumPy Array¶

In [14]:
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
In [15]:
import numpy as np
Arr = np.array(data)
Arr
Out[15]:
array([['Robin', '26', '45.34'],
       ['Karan', '25', '78.5'],
       ['Priya', '23', '87.67'],
       ['Varun', '22', '56'],
       ['Keisha', '23', '97']], dtype='<U32')
In [16]:
df = pd.DataFrame(Arr)
df
Out[16]:
        0   1      2
0   Robin  26  45.34
1   Karan  25   78.5
2   Priya  23  87.67
3   Varun  22     56
4  Keisha  23     97
In [17]:
df = pd.DataFrame(Arr,columns = ['Name','Age','Marks'])
df
Out[17]:
     Name  Age  Marks
0   Robin   26  45.34
1   Karan   25   78.5
2   Priya   23  87.67
3   Varun   22     56
4  Keisha   23     97
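Note that Out[15] reports dtype='<U32': NumPy coerced the mixed list to strings, so Age and Marks in this DataFrame are text, not numbers. One possible way to restore numeric columns is astype, sketched below on a shortened copy of the data.

```python
import numpy as np
import pandas as pd

data = [['Robin', 26, 45.34], ['Karan', 25, 78.5]]
arr = np.array(data)                     # mixed types are coerced to strings
df = pd.DataFrame(arr, columns=['Name', 'Age', 'Marks'])
print(df.dtypes)                         # all columns hold strings

# Convert the numeric columns back explicitly.
df = df.astype({'Age': int, 'Marks': float})
print(df.dtypes)
print(df['Age'].sum())                   # 51
```

Building the DataFrame directly from the list, as in the 2D list section, avoids the coercion in the first place.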

5. Column Selection, Addition & Deletion¶

Selection¶

In [18]:
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
In [19]:
df = pd.DataFrame(data,columns = ['Name','Age','Marks1'])
df
Out[19]:
     Name  Age  Marks1
0   Robin   26   45.34
1   Karan   25   78.50
2   Priya   23   87.67
3   Varun   22   56.00
4  Keisha   23   97.00

Select a Single Column¶

In [20]:
df['Name']
Out[20]:
0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object

or¶

In [21]:
df.Name
Out[21]:
0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object
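Both forms above return a Series. As a small sketch on a shortened copy of the data, passing a list of labels inside the brackets selects several columns and returns a DataFrame instead. The dot form only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute, so the bracket form is the safer default.

```python
import pandas as pd

df = pd.DataFrame(
    [['Robin', 26, 45.34], ['Karan', 25, 78.5]],
    columns=['Name', 'Age', 'Marks1'],
)

one = df['Name']             # single label -> Series
many = df[['Name', 'Age']]   # list of labels -> DataFrame
print(type(one).__name__)    # Series
print(type(many).__name__)   # DataFrame
```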

Addition¶

In [22]:
df
Out[22]:
     Name  Age  Marks1
0   Robin   26   45.34
1   Karan   25   78.50
2   Priya   23   87.67
3   Varun   22   56.00
4  Keisha   23   97.00
In [23]:
df['Marks2'] = [78,56,98,45,66]
In [24]:
df['Roll No'] = [10,11,12,13,14]
In [25]:
df
Out[25]:
     Name  Age  Marks1  Marks2  Roll No
0   Robin   26   45.34      78       10
1   Karan   25   78.50      56       11
2   Priya   23   87.67      98       12
3   Varun   22   56.00      45       13
4  Keisha   23   97.00      66       14

Adding a new column as the sum of columns Marks1 and Marks2¶

In [26]:
df['Total Marks']=df['Marks1']+df['Marks2']
df
Out[26]:
     Name  Age  Marks1  Marks2  Roll No  Total Marks
0   Robin   26   45.34      78       10       123.34
1   Karan   25   78.50      56       11       134.50
2   Priya   23   87.67      98       12       185.67
3   Varun   22   56.00      45       13       101.00
4  Keisha   23   97.00      66       14       163.00
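Assigning with df['...'] = ... changes df in place. As an alternative sketch, not the approach used in this notebook, assign() returns a new DataFrame and leaves the original untouched. The column name Total is chosen here only because assign() takes the name as a keyword, which cannot contain a space.

```python
import pandas as pd

df = pd.DataFrame({'Marks1': [45.34, 78.5], 'Marks2': [78, 56]})

# assign() builds a copy with the extra column; df itself is not modified.
with_total = df.assign(Total=df['Marks1'] + df['Marks2'])
print(with_total)
print(list(df.columns))   # ['Marks1', 'Marks2']
```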

Deletion¶

  • Deleting a column using the del statement.
In [27]:
del df['Roll No']
df
Out[27]:
     Name  Age  Marks1  Marks2  Total Marks
0   Robin   26   45.34      78       123.34
1   Karan   25   78.50      56       134.50
2   Priya   23   87.67      98       185.67
3   Varun   22   56.00      45       101.00
4  Keisha   23   97.00      66       163.00

Deleting a column using the pop() method.¶

In [28]:
df.pop('Age')
df
Out[28]:
     Name  Marks1  Marks2  Total Marks
0   Robin   45.34      78       123.34
1   Karan   78.50      56       134.50
2   Priya   87.67      98       185.67
3   Varun   56.00      45       101.00
4  Keisha   97.00      66       163.00
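Unlike del, pop() also returns the removed column, which the cell above discards. A short sketch on a shortened copy of the data shows how the returned Series can be kept:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Robin', 'Karan'], 'Age': [26, 25], 'Marks1': [45.34, 78.5]})

ages = df.pop('Age')       # removes 'Age' from df and returns it as a Series
print(ages.tolist())       # [26, 25]
print(list(df.columns))    # ['Name', 'Marks1']
```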

Rows can be deleted using the drop() method.¶

In [29]:
df
Out[29]:
     Name  Marks1  Marks2  Total Marks
0   Robin   45.34      78       123.34
1   Karan   78.50      56       134.50
2   Priya   87.67      98       185.67
3   Varun   56.00      45       101.00
4  Keisha   97.00      66       163.00
In [30]:
df = df.drop(0)
df
Out[30]:
     Name  Marks1  Marks2  Total Marks
1   Karan   78.50      56       134.50
2   Priya   87.67      98       185.67
3   Varun   56.00      45       101.00
4  Keisha   97.00      66       163.00
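Note that drop() returns a new DataFrame, which is why the result is assigned back to df, and that the remaining rows keep their original labels (1 to 4). The sketch below, on a shortened copy of the data, shows a few other ways drop() can be used; reset_index(drop=True) is one option for renumbering the rows afterwards.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Robin', 'Karan', 'Priya'], 'Marks1': [45.34, 78.5, 87.67]})

rows_dropped = df.drop([0, 2])                   # drop several rows by label
cols_dropped = df.drop(columns=['Marks1'])       # drop a column instead of a row
renumbered = df.drop(0).reset_index(drop=True)   # relabel rows from 0 again

print(list(rows_dropped.index))     # [1]
print(list(cols_dropped.columns))   # ['Name']
print(list(renumbered.index))       # [0, 1]
```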